Value Function Approximation in Zero-Sum Markov Games
Authors
Abstract
This paper investigates value function approximation in the context of zero-sum Markov games, which can be viewed as a generalization of the Markov decision process (MDP) framework to the two-agent case. We generalize error bounds from MDPs to Markov games and describe generalizations of reinforcement learning algorithms to Markov games. We present a generalization of the optimal stopping problem to a two-player simultaneous-move Markov game. For this special problem, we provide stronger bounds and can guarantee convergence for LSTD and temporal difference learning with linear value function approximation. We demonstrate the viability of value function approximation for Markov games by using the least-squares policy iteration (LSPI) algorithm to learn good policies for a soccer domain and a flow control problem.
Similar resources
A TRANSITION FROM TWO-PERSON ZERO-SUM GAMES TO COOPERATIVE GAMES WITH FUZZY PAYOFFS
In this paper, we deal with games with fuzzy payoffs. We prove that players who are playing a zero-sum game with fuzzy payoffs against Nature are able to increase their joint payoff, and hence their individual payoffs, by cooperating. It is shown that a cooperative game with the fuzzy characteristic function can be constructed via the optimal game values of the zero-sum games with fuzzy payoff...
Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games
This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in Lp-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we ca...
Numerical Approximations for Nonzero-Sum Stochastic Differential Games
The Markov chain approximation method is a widely used and efficient family of methods for the numerical solution of a large part of stochastic control problems in continuous time for reflected-jump-diffusion-type models. It converges under broad conditions, and there are good algorithms for solving the numerical approximations if the dimension is not too high. It has been extended to zero-sum st...
Sufficient Condition on the Existence of Saddle Points on Markov Games
We study sufficient conditions for the existence of a saddle point of a time-dependent discrete Markov zero-sum game up to a given stopping time. The stopping time is allowed to be either a finite or an infinite non-negative random variable, provided its associated objective function is well-defined. The result enables us to show the existence of the saddle points of discrete games constructed...
Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games
In timed, zero-sum games, the goal is to maximize the probability of winning, which is not necessarily the same as maximizing our expected reward. We consider cumulative intermediate reward to be the difference between our score and our opponent’s score; the “true” reward of a win, loss, or tie is determined at the end of a game by applying a threshold function to the cumulative intermediate re...